A Hybrid Model for Phrase Chunking Employing Artificial Immunity System and Rule Based Methods
Abstract
Natural Language Understanding (NLU), an important field of Artificial Intelligence (AI), is concerned with speech and language understanding between humans and computers. Understanding language means knowing what concept a word or phrase stands for and how to link these concepts to form a meaningful sentence. Identification of phrases, or phrase chunking, is an important step in NLU. A chunker identifies syntactically correlated word groups and divides sentences into them. Question Answering (QA) systems, another important application of AI, mostly require the retrieval of nouns or noun phrases as answers to the questions raised by users. Chunking is also an important preprocessing step in full parsing. Because natural language is highly ambiguous, exact parsing of text may become very complex. This ambiguity may be partially resolved by using chunking as an intermediate step. To the best of our knowledge, no prior work or tag set is available for phrase chunking in Malayalam. To separate the chunks in a document, the document must first be labeled with part-of-speech (POS) tags. POS tagging is a difficult task in Malayalam because it is a complex and compounding language. In this paper we describe the application of an artificial immunity system (AIS) to chunking; the implemented system achieved 96% precision and 93% recall. The system was tested on corpora collected from reputed newspapers and magazines. These corpora contained documents from five different domains (sports, health, agriculture, science, and politics), and each document contained simple, compound, and complex sentences of various levels of complexity. A POS tag set with 52 tags was developed for preparing the tagged corpus for Malayalam. The phrase tag set contains 20 phrase tags.
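To make the terms in the abstract concrete, the following is a minimal sketch of chunking over POS-tagged tokens and of chunk-level precision and recall. It is not the AIS method of the paper, whose details the abstract does not give; the single grouping rule, the tag names (`DET`, `ADJ`, `NOUN`, `VERB`), the English example sentence, and the function names are all assumptions made for illustration, and they are unrelated to the 52-tag POS set or the 20 phrase tags mentioned above.

```python
from typing import List, Tuple

Token = Tuple[str, str]        # (word, POS tag)
Chunk = Tuple[int, int, str]   # (start index, end index exclusive, phrase tag)

# Hypothetical POS tags that may appear inside a noun phrase (illustration only).
NP_TAGS = {"DET", "ADJ", "NOUN"}


def chunk_noun_phrases(tokens: List[Token]) -> List[Chunk]:
    """Group maximal runs of NP_TAGS that contain at least one NOUN."""
    chunks: List[Chunk] = []
    start = None  # index where the current candidate run began

    def flush(end: int) -> None:
        # Emit the run [start, end) as an NP only if it contains a noun.
        if start is not None and any(tag == "NOUN" for _, tag in tokens[start:end]):
            chunks.append((start, end, "NP"))

    for i, (_, tag) in enumerate(tokens):
        if tag in NP_TAGS:
            if start is None:
                start = i
        else:
            flush(i)
            start = None
    flush(len(tokens))  # close a run that reaches the end of the sentence
    return chunks


def precision_recall(predicted: List[Chunk], gold: List[Chunk]) -> Tuple[float, float]:
    """A chunk counts as correct only if its span and its tag both match."""
    pred_set, gold_set = set(predicted), set(gold)
    correct = len(pred_set & gold_set)
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    return precision, recall


# Toy sentence with hand-assigned tags (assumed, not taken from the paper's corpus).
sentence = [("the", "DET"), ("old", "ADJ"), ("farmer", "NOUN"),
            ("sold", "VERB"), ("fresh", "ADJ"), ("rice", "NOUN")]
predicted_chunks = chunk_noun_phrases(sentence)

# Hypothetical gold annotation that also includes a verb phrase.
gold_chunks = [(0, 3, "NP"), (3, 4, "VP"), (4, 6, "NP")]
precision, recall = precision_recall(predicted_chunks, gold_chunks)
```

In this toy case both predicted noun phrases match the gold annotation, so precision is 1.0, while the missed verb phrase lowers recall to 2/3. Exact matching on both span and tag is one common convention for scoring chunkers; the abstract does not state which convention underlies the reported 96% and 93%.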
Similar resources
A hybrid approach to Urdu verb phrase chunking
A variety of verb phrases exist in Urdu, including simple verb phrases, conjunct verb phrases and compound verb phrases. This paper explains the structure of Urdu verb phrases, and details a series of experiments to automatically tag them. Initially, a rule-based model is developed using 21 linguistic rules for automatic VP chunking. A 100,000 word Urdu corpus is manually tagged with VP chunk tag...
Phrase Chunking Using Entropy Guided Transformation Learning
Entropy Guided Transformation Learning (ETL) is a new machine learning strategy that combines the advantages of decision trees (DT) and Transformation Based Learning (TBL). In this work, we apply the ETL framework to four phrase chunking tasks: Portuguese noun phrase chunking, English base noun phrase chunking, English text chunking and Hindi text chunking. In all four tasks, ETL shows better r...
Determining the Boundaries and Types of Syntactic Phrases in Persian Texts
Text tokenization is the process of splitting text into meaningful tokens such as words, phrases, and sentences. Tokenization of syntactic phrases, known as chunking, is an important preprocessing step needed in many applications such as machine translation, information retrieval, and text-to-speech. In this paper chunking of Farsi texts is done using statistical and learning methods and the grammat...
Enhancing the Performance of Part of Speech tagging of Nepali language through Hybrid approach
Part-of-speech tagging is the process of marking up the words in a text (corpus) as corresponding to a particular part of speech, based on both their definition and their context, i.e. their relationship with adjacent and related words in a phrase, sentence, or paragraph. Part-of-Speech (POS) tagging is the process of assigning the appropriate part of speech or lexical category to each word in a ...
A Grammar Checking System for Punjabi
This article describes the grammar checking system developed for detecting various grammatical errors in Punjabi texts. This system utilizes a full-form lexicon for morphological analysis, and applies rule-based approaches for part-of-speech tagging and phrase chunking. The system follows a novel approach of performing agreement checks at phrase and clause levels using the gramm...
Publication date: 2011